NSF PAR Search | NSF Public Access Repository

Mirage: A Multi-Level Superoptimizer for Tensor Programs

Wu, Mengdi; Cheng, Xinhao; Liu, Shengyu; Shi, Chunan; Ji, Jianan; Ao, Man Kit; Velliengiri, Praveen; Miao, Xupeng; Padon, Oded; Jia, Zhihao (July 2025, OSDI)

Free, publicly-accessible full text available July 3, 2026
Mirage: A Multi-Level Superoptimizer for Tensor Programs

Wu, Mengdi; Cheng, Xinhao; Liu, Shengyu; Shi, Chunan; Ji, Jianan; Ao, Man Kit; Velliengiri, Praveen; Miao, Xupeng; Padon, Oded; Jia, Zhihao (July 2025, USENIX)

Free, publicly-accessible full text available July 7, 2026
Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow

Mei, Yixuan; Zhuang, Yonghao; Miao, Xupeng; Yang, Juncheng; Jia, Zhihao; Vinayak, Rashmi (March 2025, Association for Computing Machinery)

Free, publicly-accessible full text available March 30, 2026
Atlas: Hierarchical Partitioning for Quantum Circuit Simulation on GPUs

https://doi.org/10.1109/SC41406.2024.00087

Xu, Mingkuan; Cao, Shiyi; Miao, Xupeng; Acar, Umut A; Jia, Zhihao (November 2024, IEEE)

Full Text Available
GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

https://doi.org/10.1145/3669940.3707220

Jeon, Byungsoo; Wu, Mengdi; Cao, Shiyi; Kim, Sunghyun; Park, Sunghyun; Aggarwal, Neeraj; Unger, Colin; Arfeen, Daiyaan; Liao, Peiyuan; Miao, Xupeng; et al. (March 2025, ACM)

Free, publicly-accessible full text available March 30, 2026
Quanto: optimizing quantum circuits with automatic generation of circuit identities

https://doi.org/10.1088/2058-9565/ad5b16

Pointing, Jessica; Padon, Oded; Jia, Zhihao; Ma, Henry; Hirth, Auguste; Palsberg, Jens; Aiken, Alex (July 2024, Quantum Science and Technology)

Abstract: Existing quantum compilers focus on mapping a logical quantum circuit to a quantum device and its native quantum gates, and they use only simple circuit identities to optimize the circuit during compilation. This approach misses more complex circuit identities that could optimize the circuit further. We propose Quanto, the first quantum optimizer that automatically generates circuit identities. Quanto takes a gate set as input and generates provably correct circuit identities for that gate set. Quanto’s automatic generation of circuit identities covers both single-qubit and two-qubit gates and yields a new database of circuit identities, some of which are, to the best of our knowledge, novel. Beyond generating new circuit identities, Quanto’s optimizer applies them to quantum circuits and finds optimized circuits that other quantum compilers, including IBM Qiskit and Cambridge Quantum Computing Tket, have not discovered. Quanto’s database of circuit identities could be applied to improve existing quantum compilers, and Quanto can be used to generate identity databases for new gate sets.
Full Text Available
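The Quanto abstract above describes generating circuit identities for a given gate set. The Python sketch below illustrates only the basic idea on single-qubit gates: it enumerates short gate sequences over a small example gate set {H, X, Z}, computes each sequence's unitary, and records a candidate identity whenever two sequences agree up to a global phase. It is not Quanto's method. The check here is numerical (floating-point with a tolerance), not a proof, and the names `GATES`, `circuit_unitary`, `equal_up_to_global_phase`, and `find_identities` are hypothetical, introduced only for illustration.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

# Single-qubit gates as 2x2 complex matrices (row-major nested lists).
Matrix = List[List[complex]]

S2 = 1 / math.sqrt(2)
I: Matrix = [[1, 0], [0, 1]]
H: Matrix = [[S2, S2], [S2, -S2]]
X: Matrix = [[0, 1], [1, 0]]
Y: Matrix = [[0, -1j], [1j, 0]]
Z: Matrix = [[1, 0], [0, -1]]

# Hypothetical example gate set; the real tool takes the gate set as input.
GATES = {"H": H, "X": X, "Z": Z}


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def circuit_unitary(gates: Sequence[Matrix]) -> Matrix:
    """Unitary of a circuit whose gates are listed in time order: G_n ... G_1."""
    u = I
    for g in gates:
        u = matmul(g, u)  # a later gate multiplies from the left
    return u


def equal_up_to_global_phase(a: Matrix, b: Matrix, tol: float = 1e-9) -> bool:
    """Numerically check a == e^{i*theta} * b for some real theta."""
    pairs = [(a[i][j], b[i][j]) for i in range(2) for j in range(2)]
    # Estimate the phase from the largest-magnitude entry of b.
    ref_a, ref_b = max(pairs, key=lambda p: abs(p[1]))
    if abs(ref_b) < tol:
        return False
    phase = ref_a / ref_b
    if abs(abs(phase) - 1) > tol:  # a global phase must have magnitude 1
        return False
    return all(abs(x - phase * y) < tol for x, y in pairs)


def find_identities(max_len: int) -> List[Tuple[tuple, tuple]]:
    """Brute force: pair each gate sequence with the first previously seen
    (never longer) sequence that has the same unitary up to global phase."""
    representatives: List[Tuple[tuple, Matrix]] = []
    identities: List[Tuple[tuple, tuple]] = []
    for n in range(max_len + 1):  # shortest sequences first
        for seq in product(GATES, repeat=n):
            u = circuit_unitary([GATES[g] for g in seq])
            for rep_seq, rep_u in representatives:
                if equal_up_to_global_phase(u, rep_u):
                    identities.append((seq, rep_seq))  # seq can be rewritten as rep_seq
                    break
            else:
                representatives.append((seq, u))
    return identities
```

For example, `find_identities(3)` includes `(("H", "H"), ())` (HH = I) and `(("H", "Z", "H"), ("X",))` (HZH = X), and `equal_up_to_global_phase(circuit_unitary([Z, X]), Y)` holds because XZ = -iY. Exhaustive enumeration grows exponentially with sequence length, so this sketch only conveys what an identity is and how one can be checked. It does not reflect how a practical generator scales.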
SpotServe: Serving Generative Large Language Models on Preemptible Instances

https://doi.org/10.1145/3620665.3640411

Miao, Xupeng; Shi, Chunan; Duan, Jiangfei; Xi, Xiaoli; Lin, Dahua; Cui, Bin; Jia, Zhihao (April 2024, ACM)

Full Text Available
Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances

Song, Ziang; Miao, Xupeng; Xi, Xiaoli; Lin, Dahua; Xu, Harry; Zhang, Minjia; Jia, Zhihao (April 2024, USENIX Association)

Full Text Available
Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances

Duan, Jiangfei; Song, Ziang; Miao, Xupeng; Xi, Xiaoli; Lin, Dahua; Xu, Harry; Zhang, Minjia; Jia, Zhihao (April 2024, USENIX)

Full Text Available
Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling

Subramanya, Suhas Jayaram; Arfeen, Daiyaan; Lin, Shouxu; Qiao, Aurick; Jia, Zhihao; Ganger, Gregory R. (October 2023, SOSP)

The Sia scheduler efficiently assigns heterogeneous deep learning (DL) cluster resources to elastic, resource-adaptive jobs. Although some recent schedulers address one aspect or another (e.g., heterogeneity or resource-adaptivity), none addresses all of them, and most scale poorly to large clusters and/or heavy workloads even without the full complexity of the combined scheduling problem. Sia introduces a new scheduling formulation that scales to the resulting search-space sizes and intentionally matches jobs and their configurations to GPU types and counts, while adapting to changes in cluster load and job mix over time. Sia also introduces a low-profiling-overhead approach to bootstrapping (for each new job) the throughput models used to evaluate possible resource assignments, and it is the first cluster scheduler to support elastic scaling of hybrid parallel jobs. Extensive evaluations show that Sia outperforms state-of-the-art schedulers. For example, even on relatively small 44- to 64-GPU clusters with a mix of three GPU types, Sia reduces average job completion time (JCT) by 30–93%, 99th-percentile JCT and makespan by 28–95%, and GPU hours used by 12–55% for workloads derived from three real-world environments. Additional experiments demonstrate that Sia scales to clusters of at least 2000 GPUs, provides improved fairness, and is not overly sensitive to scheduler parameter settings.
Full Text Available
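The Sia abstract above reports its results as reductions in average JCT, 99th-percentile JCT, makespan, and GPU hours. The Python sketch below shows one way these metrics and a percent reduction could be computed from per-job records. It is not Sia's evaluation code. The `JobRecord` fields, the nearest-rank percentile convention, and the definition of makespan as the time from the first submission to the last completion are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobRecord:  # hypothetical per-job record
    submit_time: float  # hours since the trace started
    finish_time: float  # hours since the trace started
    gpu_hours: float    # GPU time consumed by the job


def nearest_rank_percentile(values: List[float], p: float) -> float:
    """p-th percentile using the nearest-rank definition (one of several conventions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(jobs: List[JobRecord]) -> Dict[str, float]:
    jcts = [j.finish_time - j.submit_time for j in jobs]  # job completion times
    return {
        "avg_jct": sum(jcts) / len(jcts),
        "p99_jct": nearest_rank_percentile(jcts, 99),
        # Assumed definition: first submission to last completion.
        "makespan": max(j.finish_time for j in jobs) - min(j.submit_time for j in jobs),
        "gpu_hours": sum(j.gpu_hours for j in jobs),
    }


def percent_reduction(baseline: float, new: float) -> float:
    """How much lower `new` is than `baseline`, in percent."""
    return 100 * (baseline - new) / baseline
```

Running `summarize` on the same workload trace under two schedulers and passing matching fields to `percent_reduction` yields figures of the same kind as those quoted above. The actual values depend entirely on the traces and cluster configuration.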
